Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency

نویسنده

  • Mathias Creutz
چکیده

We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. Our algorithm is shown to outperform two competing algorithms, when evaluated on data from a language with agglutinative morphology (Finnish), and to perform well also on English data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages

Morph length is one of the indicative feature that helps learning the morphology of languages, in particular agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how the knowledge of morph length affect the performance of the segmentation task under the Bayesian framework. The model is based on (Goldwater et al., 2006) unigram...

متن کامل

Unsupervised Texture Image Segmentation Using MRFEM Framework

Texture image analysis is one of the most important working realms of image processing in medical sciences and industry. Up to present, different approaches have been proposed for segmentation of texture images. In this paper, we offered unsupervised texture image segmentation based on Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...

متن کامل

Unsupervised Texture Image Segmentation Using MRFEM Framework

Texture image analysis is one of the most important working realms of image processing in medical sciences and industry. Up to present, different approaches have been proposed for segmentation of texture images. In this paper, we offered unsupervised texture image segmentation based on Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...

متن کامل

Adaptive Segmentation with Optimal Window Length Scheme using Fractal Dimension and Wavelet Transform

In many signal processing applications, such as EEG analysis, the non-stationary signal is often required to be segmented into small epochs. This is accomplished by drawing the boundaries of signal at time instances where its statistical characteristics, such as amplitude and/or frequency, change. In the proposed method, the original signal is initially decomposed into signals with different fr...

متن کامل

Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation

Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies which are used to suggest the most likely words matching user input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a method for text entry, which circumvents the problem of out-of-vocabul...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003